ABSTRACT

Twitter is one of the most used applications on the current Internet,
with more than 200M accounts created so far. Like other large-scale
systems, Twitter can benefit from exploiting the Locality effect that
exists among its users. In this paper we perform the first
comprehensive study of the Locality effect in Twitter. For this purpose
we have collected the geographical location of around 1M Twitter users
and 16M of their followers. Our results demonstrate that language and
cultural characteristics determine the level of Locality expected for
different countries. Countries whose language is not English, such as
Brazil, typically show a high intra-country Locality, whereas countries
where English is an official or co-official language experience an
external Locality effect. That is, their users have a larger number of
followers in the US than within their own country. There are two
reasons for this: first, the US is the dominant country in Twitter,
accounting for around half of the users; second, these countries share
a common language and cultural characteristics with the US.

1. INTRODUCTION 

Twitter [1] is a microblogging system created in 2006 by Jack Dorsey
and Biz Stone. It has rapidly attracted a large number of users and
become one of the most successful platforms for both social interaction
and information diffusion. Twitter currently has around 200 million
users, and more than 140 million tweets are uploaded to the system
every day. In Twitter a user can post text messages of up to 140
characters, named tweets. Furthermore, a given user registered in the
system, e.g. Bob, can follow any other user in the system, e.g. Alice.
We then refer to Bob as Alice's follower and to Alice as Bob's friend.
This friend→follower relationship (or link) allows Bob to see every
tweet posted by Alice.

The great success of Twitter has attracted the re- 
search community that has recently started to investi- 
gate different aspects of Twitter [1 [3 [i il [TOl [H] . In 
this paper we study the Locality effect in Twitter. That is, we examine
whether the followers of a given user are geographically concentrated
and, if so, where.



Understanding the Locality phenomenon of large-scale systems such as
p2p systems [3, 6, 11, 14] or Online Social Networks (OSNs) [13] is
critical in order to improve the system design and the performance
experienced by users while reducing infrastructure and operational
costs. Furthermore, it can also help to improve the design and
performance of the data storage system [12]. This paper is, to the best
of the authors' knowledge, a first step towards understanding the
Locality effect in Twitter.

To conduct our study we have collected a real dataset including the
geographical location of around 1M Twitter users (or friends) and more
than 16M followers associated with them. Overall, our dataset includes
more than 100M friend→follower links.

We capture the Locality effect with two different metrics: (i) the link
level distance is the distance associated with each friend→follower
pair, whereas (ii) the user level distance is a single representative
value per user, such as the median distance to the user's followers.
The main difference is that very popular users (with many followers)
weigh more at the link level, while every user has the same influence
(one median distance) at the user level.

Using these metrics we perform a two-fold analysis. First, we look at
the forest as a whole; second, we look at the individual trees. That
is, we initially look at Twitter as a whole and measure the Locality at
both the link level and the user level. The results suggest that there
is an important intra-country Locality effect defined by short-distance
links. However, we observe a surprisingly high percentage (> 25%) of
cross-continent relationships, which may imply an external Locality
phenomenon (that is, a user with most of its followers located in a
different country). Furthermore, we demonstrate that the level of
traditional Locality (i.e. short-distance links) is higher at the user
level than at the link level. This is because popular users, who
account for a larger number of links, are the ones most likely to
experience the described external Locality.

In the second part of the paper we go into the forest to see whether
these global trends apply to every user. For this purpose we perform a
country-based analysis. We have selected the country as the grouping
criterion because, first, we observe a high level of intra-country
Locality and, second, it allows us to accurately group users sharing a
language and a culture (which obviously influence the users'
relationships in Twitter). Specifically, we analyze the 15 countries
with the largest number of friends and followers in our dataset. The
first result is the predominance of the US, which is responsible for
around half of the friends, followers and links in our dataset. Hence,
the observed global trends are highly influenced by the Locality
properties of US Twitter users. We also analyze the Locality at the
link level for each of the Top 15 countries. For this purpose, we
compute the percentage of friend→follower links of the Twitter users of
a given country that stay local within the country, go to the US, or go
to a country other than the US. We found three different profiles.
First, there are countries such as Brazil that experience a quite high
intra-country Locality effect and keep most of their connections local.
These countries typically have an official language other than English
and a strong, long-established culture. Second, there are countries
that experience the external Locality phenomenon at the link level,
that is, the largest portion of their links goes to the US. These are
countries where English is an official (or co-official) language.
Finally, we observe a set of countries whose links are shared roughly
equally among those staying local, those going to the US and those
going to other countries. Afterwards, we perform the Locality analysis
at the user level for 4 countries: the US and the most important
representative of each of the defined profiles, namely Brazil, UK and
France respectively.
We confirm that the intra-country Locality is ordered as follows among
these countries: Brazil > US > France > UK, which is consistent with
the profiles defined at the link level. Specifically, Brazil shows a
surprisingly high intra-country Locality: for most Brazilian users
(independently of their popularity) 80 to 90% of the followers are
local. The US shows a slightly lower intra-country Locality than Brazil
and, also at the user level, has an important influence on the Locality
trends observed for the whole Twitter system. In the UK we observe a
clear bi-polarity: unpopular users show an important level of
intra-country Locality, whereas popular users typically experience an
external Locality and have most of their followers in the US. Finally,
in France we observe a similar bi-polarity with a stronger bias towards
intra-country Locality.

In a nutshell, the general Locality trends in Twitter show a clear
presence of intra-country Locality as well as a previously unreported
external Locality phenomenon. However, these trends cannot be applied
generally, since they are mostly influenced by the dominant presence of
the US in the Twitter demographics (50% of our dataset). Therefore,
studying Locality in Twitter requires a per-country analysis, which
clearly demonstrates that the language and cultural characteristics of
a country contribute to its Locality profile. We find countries such as
Brazil with 90% intra-country Locality, and others such as Australia
where around half of the friend→follower links go to the US while only
one quarter are established inside the country.

The rest of the paper is organized as follows: Section 2 describes the
measurement methodology as well as the collected dataset. Sections 3
and 4 present the Locality analysis of Twitter at the global and
country levels respectively. Finally, Section 5 presents the related
work and Section 6 concludes the paper.

2. MEASUREMENT METHODOLOGY AND 
INFRASTRUCTURE 

Our main objective is to collect, for a large number of Twitter users,
their geographical location, the list of their followers and the
geographical location of those followers. This information can be
obtained from the Twitter REST API [2]. Specifically, when queried for
a given user-id this API provides: (i) the user's profile information,
including a location-tag introduced by the user, (ii) a list of the
user-ids of the followers and (iii) other information such as the
number of friends of the user and the number of tweets posted by the
user so far.

For our study we have analyzed a random set of 2M users obtained from
[10]. For each of these users we have collected the geographical
location, the number of friends, the number of posted tweets and the
number of followers. Furthermore, we have also used the API to find the
geographical location of all the followers of each analyzed user.
Unfortunately, Twitter limits the number of queries to 350 per hour per
IP address/user-id (see the footnote on whitelisted accounts).
Therefore, in order to speed up the data collection process we
developed a master-slave distributed measurement architecture. This
architecture consists of 1 master and 20 slaves located in different
virtual machines on top of two physical machines. The master indicates
to each slave the user-ids to be monitored. Each slave has its own IP
address and user-id and can thus perform 350 queries per hour to the
Twitter API. Therefore, this distributed measurement architecture
allows us to perform up to 7K queries per hour. Finally, the slaves
store the collected information in a redundant centralized database.
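The query budget of this architecture follows from simple arithmetic,
which the sketch below makes explicit. It is only an illustration: the
round-robin assignment of user-ids is an assumed scheduling policy (the
text does not say how the master splits the work), and the function
names are hypothetical. The "workers" in the code correspond to the
slaves described above.

```python
from itertools import cycle

QUERIES_PER_HOUR_PER_WORKER = 350  # rate limit quoted in the text
NUM_WORKERS = 20                   # number of slaves in the described setup


def aggregate_rate(num_workers=NUM_WORKERS,
                   per_worker=QUERIES_PER_HOUR_PER_WORKER):
    """Total queries per hour when every worker uses its full quota."""
    return num_workers * per_worker


def assign_round_robin(user_ids, num_workers=NUM_WORKERS):
    """Assumed policy: hand out user-ids to the workers in turn."""
    batches = [[] for _ in range(num_workers)]
    for worker, uid in zip(cycle(range(num_workers)), user_ids):
        batches[worker].append(uid)
    return batches


def hours_needed(num_queries, num_workers=NUM_WORKERS,
                 per_worker=QUERIES_PER_HOUR_PER_WORKER):
    """Lower bound on crawl time in hours (ceiling division)."""
    rate = aggregate_rate(num_workers, per_worker)
    return -(-num_queries // rate)
```

With the figures quoted above, `aggregate_rate()` gives the 7K queries
per hour stated in the text.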

Footnote: In the past Twitter granted whitelisted accounts that were
allowed to perform up to 20K queries per hour. Unfortunately, these
whitelisted accounts are no longer available.

Figure 1: Distance Distribution and Distance vs Popularity for user and
link level Locality. (a) Distribution (b) Dist. vs Pop.

The collected location of a user is the one the user provides in his or
her Twitter profile. Hence, it is not homogeneous and in some cases it
is missing or meaningless. Our measurement tool filters out those users
that do not provide location information or provide a meaningless
location. Furthermore, we use the Yahoo geolocation API 3 in order to
homogenize the obtained data. For instance, all those users indicating
NY, NYC, New York City, etc. are mapped to the same city, i.e. New York
City. It is worth mentioning that in Appendix A we demonstrate that the
location-tag provided by the user in the profile accurately defines the
geographical location of the user.
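The homogenization step can be illustrated with a toy stand-in. The
sketch below does not call the Yahoo geolocation API; instead it
assumes a small hand-written alias table (hypothetical, for
illustration only) and shows the two operations described above:
mapping different spellings to one canonical city and discarding empty
or unrecognized tags.

```python
import re

# Hypothetical alias table; the paper relies on a geolocation API instead.
ALIASES = {
    "ny": "New York City",
    "nyc": "New York City",
    "new york": "New York City",
    "new york city": "New York City",
}


def normalize_location(tag):
    """Return the canonical city for a location-tag, or None if the tag
    is empty or not recognized (such users are filtered out)."""
    if tag is None:
        return None
    # Lower-case, replace punctuation by spaces, collapse whitespace.
    key = re.sub(r"[^a-z0-9 ]+", " ", tag.lower())
    key = " ".join(key.split())
    return ALIASES.get(key)
```

A real geocoder also resolves coordinates and countries; the point here
is only that free-text tags must be reduced to a shared key before any
distance or country statistics are computed.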

We crawled the Twitter API with the described software from 10-01-2011
until 28-04-2011. After filtering, the resulting dataset includes 973K
geolocated friends, 16.5M geolocated followers and more than 100M
friend→follower relationships.

3. GLOBAL LOCALITY IN TWITTER 

In this section we quantify the level of Locality of the
friend→follower graph in Twitter. That is, we aim to answer the
following question: are followers typically located close to their
friends? For this purpose we use the following two metrics:

- link level distance: the geographical distance of each individual
friend→follower link in our dataset.
- user level distance: the median geographical distance between a
friend and the population of its followers.

Figure 1(a) shows the CDF of both metrics. If we focus first on the
link level distance, we observe that 35% of the links have an
associated distance lower than 1000 km. This represents intra-country
communications for the most representative countries in our dataset
(see Table 1). Furthermore, we observe that 67% of the links are within
a range of 4000 km, which means intra-country communications for big
countries such as the US or Brazil and intra-continent relationships
for western Europe. However, there is still around 25% of long-distance
links over 6500 km, which represent cross-continent links. Therefore,
we can conclude that Twitter is not a very highly localized system. It
must be noted that the link level distance analyzes individual links
and does not capture well the Locality at the user level, since popular
users with millions of followers have a higher impact on the presented
distribution than unpopular users. The user level distance, instead,
avoids this effect. We observe that the distribution at the user level
is more skewed than the previous one. Specifically, 80% of the users
have a typical distance to their followers



Country    Friends (num / %)    Followers (num / %)
US         528K / 54.24%        7.37M / 44.59%
UK         70.6K / 7.27%        987K / 5.98%
BR         61.7K / 6.34%        1.81M / 10.94%
CA         39.4K / 4.05%        565K / 3.42%
DE         21.7K / 2.23%        331K / 2.00%
AU         20.3K / 2.09%        232K / 1.40%
IN         18.8K / 1.93%        442K / 2.67%
NL         14.9K / 1.53%        334K / 2.02%
ID         12.1K / 1.24%        862K / 5.22%
FR         10.8K / 1.11%        232K / 1.41%
ES         8.7K / 0.89%         277K / 1.68%
IT         7.1K / 0.73%         159K / 0.96%
JP         6.9K / 0.71%         192K / 1.16%
IE         6.5K / 0.67%         95.4K / 0.58%
MX         5.5K / 0.56%         234K / 1.41%
TOP 15     833K / 85.60%        13.37M / 85.44%
ALL        973K / 100%          16.53M / 100%



Table 1: Number of friends and followers of the 
Top 15 countries in our dataset 




Figure 2: Geographical distributions of the followers for the Top 15
countries (percentage of followers within the country, in US and in
other countries)

< 4000 km (i.e. intra-country or intra-continent links). Hence, the
user level shows a higher intra-country Locality than the link level.
This suggests that popular users (i.e. those with a larger number of
followers) are responsible for most of the long-distance links and have
a larger typical distance to their followers than unpopular users. In
order to confirm this hypothesis we group the users by their popularity
(i.e. number of followers; the buckets are listed in the footnote on
popularity buckets) and for each group we calculate the median of the
user and link level distances. The results are depicted in Figure 1(b).
The figure validates the previous hypothesis, since we observe that the
more popular a user is, the larger the distance to the population of
its followers.
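The grouping by popularity can be sketched as follows. The bucket
limits are taken from the footnote on popularity buckets, read as
non-overlapping ranges (an assumption); the function names and the
input layout are hypothetical, and only the user level median is
shown.

```python
from bisect import bisect_left
from statistics import median

# Upper limits of the buckets [1-50], [51-100], ..., [100001-500000];
# users above the last limit fall into a final open-ended bucket.
UPPER_BOUNDS = [50, 100, 500, 1000, 5000, 10000, 50000, 100000, 500000]


def bucket_index(num_followers):
    """Index (0..9) of the popularity bucket for a follower count >= 1."""
    return bisect_left(UPPER_BOUNDS, num_followers)


def median_distance_per_bucket(users):
    """users: iterable of (num_followers, user_level_distance_km) pairs.
    Returns {bucket index: median distance} for the non-empty buckets."""
    grouped = {}
    for num_followers, distance_km in users:
        grouped.setdefault(bucket_index(num_followers), []).append(distance_km)
    return {bucket: median(values) for bucket, values in sorted(grouped.items())}
```

The link level curve can be obtained the same way by feeding one pair
per link (the follower count of the friend and the link distance)
instead of one pair per user.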

In summary, we have demonstrated that Twitter is not a highly localized
system at the link level, since there is an important portion of
long-distance relationships, whereas the localization is more marked at
the user level. Furthermore, we have seen that popularity clearly
impacts the Locality level of the users. However, this global analysis
is clearly influenced by the dominance of the US, which represents 50%
of the friends, followers and links in our dataset (see Table 1).
Therefore, in the rest of the paper we deepen and broaden the study by
analyzing geo-political, cultural and language aspects in order to
answer the following questions: Are the reported global observations
valid for every country? What are the causes of the observed
distribution of intra-country, intra-continent and cross-continent
relationships?

Footnote: We group the users into the following popularity buckets as a
function of the number of followers: [1-50], [51-100], [101-500],
[501-1000], [1001-5000], [5001-10000], [10001-50000], [50001-100000],
[100001-500000] and a last bucket including all those users having
> 500K followers.

4. COUNTRY LOCALITY IN TWITTER 

In this section we group the friends in our dataset by country. We have
selected the country as the grouping criterion since it allows us to
accurately group those friends having a close geographical location, a
similar cultural profile and the same language. We first study the
demographics of our dataset, and then perform a country-based analysis
of link level and user level Locality.

4.1 Twitter demographics 

In order to study the demographics of our dataset we select the 15
countries contributing the largest number of friends. The detailed
demographic numbers of each of these 15 countries are summarized in
Table 1. Note that overall these 15 countries account for around 85% of
the friends and followers in our dataset. First, as already stated, we
observe that the US is the predominant country in Twitter, responsible
for around half of the friends, followers and links in our dataset.
Furthermore, from the language perspective we differentiate two
profiles. On the one hand, we have those countries whose official (or
co-official) language is English, such as the US, Canada, UK, Ireland,
India and Australia. On the other hand, we find those countries with an
official language other than English, such as Brazil, Spain, Germany,
France, Italy, Indonesia, Japan and the Netherlands. Finally, it is
worth noting the presence of developing countries such as Brazil, India
and Mexico in the list. This is mainly due to the large population of
these countries, which makes it easier to contribute a large number of
users, but it also indicates the interest of their population in new
social ways of communication such as Twitter.

Once we know the basic demographics of our dataset, 
our second aim is understanding what is the level of 
intra-country Locality and inter-country interaction in 
Twitter at link and user levels. 

4.2 Link-based Analysis 

For each of the Top 15 countries we compute the percentage of links
originating in the country that: (i) remain within the country, (ii) go
to the US and (iii) go to a country other than the US. Figure 2 depicts
the obtained results. As expected, the observed global Locality trends
do not apply to every country and are mostly influenced by the US
Locality properties. Based on our observations we can distinguish 4
different profiles:

US: due to its predominant role, it has to be considered as a separate
profile. It keeps more than 70% of its friend→follower relationships
local. This is a consequence of, first, the predominance of US users in
Twitter and, second, the strong local culture of the US.
Local profile: This is formed by a group of countries that keep more
links local than those going to the US or to other countries, that is,
Local > US and Local > Other in Figure 2. This profile includes Brazil,
the Netherlands, Indonesia, Germany and Spain. All these countries have
an official language other than English. Furthermore, we also found
some significant differences within the group. At one extreme, Brazil
is the country showing the highest Locality in our dataset, with almost
80% of local links. This is because it is a big country with a strong
local culture and its language (Portuguese) is not widely spoken
elsewhere: the other countries that use Portuguese, such as Portugal,
are not very representative in Twitter. At the other extreme, we have
Spain, whose local links are reduced to 41%, since many relationships
(around 30%) are established with South America (common language) and
other European countries (fellow EU members).
Shared Locality profile: This is formed by those countries that
distribute their friend→follower links roughly equally among those that
remain local, those that go to the US and those that go to other
countries. This profile includes France, Mexico, Italy and Japan, which
are the countries where Twitter is least popular among the studied
ones. Therefore, at the individual link level, intra-country Locality
depends strongly on the local popularity of Twitter: we expect a lower
intra-country Locality in those countries where Twitter is less
popular.

English-based (external) Locality profile: This is formed by countries
where English is an official or co-official language. These countries
concentrate the major part of their links among themselves.
Specifically, they experience an important external Locality with many
friend→follower links going to the US (e.g. 48% in the case of India
and 47% in the case of Australia and Canada). Furthermore, a lower but
still important portion of links stays local (e.g. 34% in the case of
UK and 31% in the case of Canada) and the rest are shared mainly with
other English-speaking countries and surrounding countries. Therefore,
in this case we observe that the combination of language and
demographics clearly influences the Locality associated with these
countries.
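The per-country breakdown used in this subsection reduces to counting
links by destination. The sketch below is illustrative only: the input
layout (one pair of country codes per friend→follower link) and the
function names are assumptions, and the only profile rule encoded is
the one stated explicitly above for the Local profile (Local > US and
Local > Other).

```python
from collections import Counter


def link_shares(links, country):
    """links: iterable of (friend_country, follower_country) pairs.
    Returns the percentage of links originating in `country` that stay
    local, go to the US, or go elsewhere; None if there are no links."""
    counts = Counter()
    for friend_country, follower_country in links:
        if friend_country != country:
            continue
        if follower_country == country:
            counts["local"] += 1   # also covers US friends with US followers
        elif follower_country == "US":
            counts["us"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return {key: 100.0 * counts[key] / total for key in ("local", "us", "other")}


def is_local_profile(shares):
    """Rule quoted in the text: Local > US and Local > Other."""
    return shares["local"] > shares["us"] and shares["local"] > shares["other"]
```

The text gives no numeric thresholds for the shared and English-based
profiles, so the sketch does not attempt to classify them.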

4.3 User-based Analysis 

The analysis performed so far has focused on understanding the Locality
at the link level. However, as we have seen in Section 3, this analysis
may not capture well the details at the user level. Next, we analyze
Locality at the user level for the Top 15 countries. Due to space
constraints, in this paper we provide the detailed analysis of one
country per profile. Specifically, we consider the country with the
largest number of users from each profile in our dataset. These are:
US, Brazil, UK and France.

Figure 3: Distance Distribution for user and link level Locality: US,
UK, France and Brazil. (a) US (b) UK (c) FR (d) BR

Figure 4: Distance vs Popularity for user and link level Locality: US,
UK, France and Brazil. (a) US (b) UK (c) FR (d) BR

For each of the selected countries we repeat the analysis performed in
Section 3. First, Figure 3 presents the distribution of link level and
user level distances for each country. We confirm that in every case
there is a higher Locality at the user level (more skewed curve) than
at the link level. Let us now study each country separately. We observe
that around 90% of US users have a typical distance to their followers
< 4000 km, which defines the boundary of intra-country relationships
for the US. This intra-country Locality effect is even more impressive
in Brazil, where 90% of the users have a user level distance < 2000 km,
although the limit of intra-country relationships is also about 4000
km. This confirms the presence of a regional-based Locality in Brazil.
The UK shows, at the user level, the bi-polarity between the UK and the
US described above. However, contrary to the link level (34% local, 42%
US), the user level presents 50% of local followers within a range of
1000 km, while those located in the US are reduced to 37%. The second
European country analyzed, France, has 60% of its followers closer than
1000 km. However, several neighboring countries such as Belgium, the
Netherlands, Switzerland, Italy and Germany are located within this
distance range. Hence, some portion of this 60% represents
inter-country relationships rather than intra-country ones. Finally,
around 1/3 of the French users have a typical distance to their
followers between 5500 and 9500 km, which represents a follower
population in the US. Thus, the described shared profile is also valid
at the user level.

Second, we analyze how popularity affects the Locality for the users of
each of the studied countries. We use the same methodology explained in
Section 3. Figure 4 shows the obtained results. We observe significant
differences among the countries. The US shows an important correlation
between popularity and Locality: the higher the popularity, the longer
the user's friend→follower links. The curves for the US are similar to
those observed for the whole system (see Figure 1(b)). This is due to
the preponderance of US users in Twitter, which makes the whole system
behave similarly to the US. In contrast, Brazilian users show a high
intra-country Locality (median distances around 1000 km) independently
of their popularity (the curve is almost flat). Finally, we observe a
clearly marked bi-polarity in the UK and France. In the UK, unpopular
users with fewer than 100 followers present a clearly marked
intra-country Locality, whereas popular users show an external Locality
phenomenon with most of their followers in the US. In France we observe
the same phenomenon, but the transition happens at around 1000
followers.

Figure 5: Percentage of Local followers vs Percentage of Followers in
US (top) and other countries (bottom) per individual user: US, UK,
France and Brazil

In order to gain more insight into the Locality at the user level we
have calculated, for each individual user of these four countries, the
percentage of links that stay local within the country, go to the US
and go to a country other than the US. Figure 5 depicts density
diagrams in which the x-axis represents the percentage of
friend→follower links that remain local and the y-axis represents the
percentage of friend→follower links that go either to the US (see
Subfigures 5(a,b,c)) or to another country (see Subfigures 5(d,e,f,g))
for each individual user. The results confirm and accurately quantify
most of our previous observations. First, we can clearly observe that
the intra-country Locality is ordered as follows: BR > US > FR > UK.
Specifically, most of the Brazilian users have between 80% and 100% of
internal followers, whereas in the US we observe a slightly lower
intra-country Locality effect, since US friends present a percentage of
local followers between 70% and 90%. Looking at the European countries,
we observe a higher level of localization in France, where the vast
majority of users are concentrated between 40% and 80% of local
followers, whereas the UK shows a less concentrated diagram covering
from 20% to 80% of local followers. Furthermore, we observe that the
remote followers of UK users are more concentrated in the US, whereas
French users tend to have more followers from countries other than the
US.
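A density diagram of this kind can be approximated by placing one point
per user and counting points per grid cell. The sketch below is a
hedged illustration: the 10% cell size, the function names and the
input layout (the friend's country plus a list of follower countries)
are assumptions, not details given in the text.

```python
from collections import Counter


def user_point(friend_country, follower_countries):
    """Return (% local followers, % followers in the US) for one user."""
    total = len(follower_countries)
    if total == 0:
        raise ValueError("user has no geolocated followers")
    local = sum(c == friend_country for c in follower_countries)
    # For US friends every US follower is local, so the US share is zero.
    in_us = sum(c == "US" and c != friend_country for c in follower_countries)
    return 100.0 * local / total, 100.0 * in_us / total


def density_grid(points, cell=10):
    """Count users per (x, y) cell of `cell` percentage points."""
    top = 100 // cell - 1  # a value of exactly 100% goes into the last cell
    grid = Counter()
    for x, y in points:
        grid[(min(int(x // cell), top), min(int(y // cell), top))] += 1
    return grid
```

Plotting the cell counts as a heat map gives the x-axis (local share)
against y-axis (US share) view described above; the variant for "other
countries" only changes the second coordinate.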



5. RELATED WORK

Locality in Large Scale Applications in the Internet: Locality is an
important aspect to be considered in large-scale applications. Taking
it into consideration may help to improve the system design and
performance, as has been demonstrated for p2p file-sharing applications
[3, 6, 14], p2p live-streaming applications [11] and OSNs such as
Facebook [13]. Although Twitter has significantly different
characteristics from p2p applications and slightly different ones from
Facebook, considering the Locality effect in the system design may help
to improve the performance and also the data storage procedure [12] of
Twitter.

Twitter Measurements: Several previous works have exploited the
different APIs offered by Twitter in order to collect data and describe
different characteristics of the system. Krishnamurthy et al. [9]
performed one of the initial measurement studies on Twitter, collecting
data from 100K users. The authors report basic characteristics of the
system such as the correlation between the number of followers and
friends of a given user or the distribution of Twitter users per
continent. Afterwards, Kwak et al. [10] collected the complete
friend→follower Twitter graph, including 41.7 million users at the
moment of the study. The authors analyze the properties of the graph
topology as well as some other social aspects of Twitter such as user
influence. Also in the field of user influence, Cha et al. [4] use a
large dataset in order to analyze the dynamics of user influence across
topics and time in Twitter. Finally, some other studies
[7] IHl dl] focus on understanding social aspects of the 
Twitter system. However, to the best of our knowledge, none of the
previous studies looks at either the location of a user's followers or
the Locality effect in Twitter.



6. CONCLUSION 

Understanding the Locality effect of Internet-scale systems has direct
implications for the improvement and performance of such systems. This
paper is, to the best of the authors' knowledge, the first study of the
Locality phenomenon in Twitter. The obtained results demonstrate that
different countries show different Locality profiles, mostly influenced
by the language and cultural characteristics of the country. At one
extreme, we have countries with an extremely high intra-country
Locality such as Brazil, where most users keep 80 to 90% of their
followers local. At the other extreme, we have countries experiencing
an external Locality phenomenon such as Australia, where 50% of the
friend→follower links go to the US while just 25% stay local within the
country. Furthermore, we have seen that the US is the dominant country
in Twitter, responsible for around half of the friends, followers and
links in our dataset. As a result, the Locality trends observed when
studying the whole Twitter system are highly influenced by the Locality
profile of US Twitter users.

APPENDIX

A. ACCURACY OF THE LOCATION-TAG 



In this paper we rely on the location-tag defined by the user in his or
her Twitter profile to geolocate the user. Specifically, we are
interested (for this paper) in accurately estimating the user's
country. In this section, we validate the location-tag as a good
approximation of the user's location.

Twitter offers its users the Tweet Geolocation Service. This service
publishes, along with the tweet, the GPS coordinates from where the
tweet was posted. We have collected data from 140K users that have the
Tweet Geolocation Service active, have a meaningful location-tag
defined in their Twitter profile and have posted at least 5 tweets with
associated GPS coordinates. For each of these users we have computed
the median geographical distance between the location specified in the
Twitter profile and the GPS coordinates provided in the tweets. Figure
6 presents the CDF of the computed distance across the analyzed users.
We can observe that most of the users (> 70%) typically post their
tweets within a range of less than 100 km from their specified
location. Thus, we can conclude that in general the location-tag
specified in the user's profile is a good estimator of the user's
location. Furthermore, we can consider it even more precise if we only
care about correctly mapping the user to a country, as we do in this
paper.
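The validation statistic can be sketched as follows, assuming the
distances between the profile location and each geotagged tweet have
already been computed in km. The function name and the input layout are
hypothetical; the 5-tweet minimum and the 100 km threshold are the
values quoted above.

```python
from statistics import median


def fraction_within(per_user_distances_km, threshold_km=100.0, min_tweets=5):
    """per_user_distances_km: {user: [distance in km for each geotagged tweet]}.
    Returns the fraction of eligible users whose median distance is below
    the threshold, or None if no user has enough geotagged tweets."""
    medians = [median(distances)
               for distances in per_user_distances_km.values()
               if len(distances) >= min_tweets]
    if not medians:
        return None
    return sum(m < threshold_km for m in medians) / len(medians)
```

Evaluating the same function over a range of thresholds yields the
points of a CDF such as the one reported in Figure 6.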
Figure 6: Median distance between the user's location-tag and the GPS
coordinates of the user's tweets